Computer science & information technology
Enhancing software product lines with machine learning components
Cobaleda, Luz-Viviana, Carvajal, Julián, Vallejo, Paola, López, Andrés, Mazo, Raúl
Modern software systems increasingly integrate machine learning (ML) due to its advancements and ability to enhance data-driven decision-making. However, this integration introduces significant challenges for software engineering, especially in software product lines (SPLs), where managing variability and reuse becomes more complex with the inclusion of ML components. Although existing approaches have addressed variability management in SPLs and the integration of ML components in isolated systems, few have explored the intersection of both domains. Specifically, there is limited support for modeling and managing variability in SPLs that incorporate ML components. To bridge this gap, this article proposes a structured framework that extends software product line engineering to support the integration of ML components, enabling the systematic modeling of variability and reuse in SPLs with ML capabilities. The proposal has been partially implemented in the VariaMos tool.
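As a rough illustration of what variability modeling with ML components involves, the sketch below encodes a tiny, hypothetical feature model in plain Python. The feature names and the GPU constraint are invented for illustration and are not taken from the article or from VariaMos:

```python
# Hypothetical SPL feature model: optional features include ML components,
# with one simple cross-tree constraint (illustrative only).
features = {"Core", "Reporting", "MLRecommender", "MLForecaster", "GPU"}

def valid(config):
    """Check a product configuration against the feature model.

    Rules (invented for this sketch):
      - every product must include the mandatory Core feature;
      - any ML component requires the GPU feature;
      - only known features may appear.
    """
    if "Core" not in config:
        return False
    if ({"MLRecommender", "MLForecaster"} & config) and "GPU" not in config:
        return False
    return config <= features
```

A real SPL tool would derive such constraints from an explicit feature diagram rather than hard-coding them, but the validity check captures the core idea of configuration-time variability management.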
- North America > United States > New York > New York County > New York City (0.04)
- South America > Colombia > Antioquia Department > Medellín (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- Education > Educational Setting (0.67)
- Information Technology > Security & Privacy (0.46)
LLMs Between the Nodes: Community Discovery Beyond Vectors
Community detection in social network graphs plays a vital role in uncovering group dynamics, influence pathways, and the spread of information. Traditional methods focus primarily on graph structural properties, but recent advancements in Large Language Models (LLMs) open up new avenues for integrating semantic and contextual information into this task. In this paper, we present a detailed investigation into how various LLM-based approaches perform in identifying communities within social graphs. We introduce a two-step framework called CommLLM, which leverages the GPT-4o model along with prompt-based reasoning to fuse language model outputs with graph structure. Evaluations are conducted on six real-world social network datasets, measuring performance using key metrics such as Normalized Mutual Information (NMI), Adjusted Rand Index (ARI), Variation of Information (VOI), and cluster purity. Our findings reveal that LLMs, particularly when guided by graph-aware strategies, can be successfully applied to community detection tasks in small to medium-sized graphs. We observe that the integration of instruction-tuned models and carefully engineered prompts significantly improves the accuracy and coherence of detected communities. These insights not only highlight the potential of LLMs in graph-based research but also underscore the importance of tailoring model interactions to the specific structure of graph data.
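The evaluation metrics named above are standard clustering-comparison measures. As an illustration, here is a minimal pure-Python implementation of one of them, Normalized Mutual Information (NMI), between two labelings; the paper's own evaluation presumably uses library implementations:

```python
from collections import Counter
from math import log, sqrt

def nmi(labels_a, labels_b):
    """Normalized Mutual Information between two clusterings,
    normalized by the geometric mean of the two entropies."""
    n = len(labels_a)
    ca, cb = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    # mutual information over the joint label distribution
    mi = sum((c / n) * log((c * n) / (ca[a] * cb[b]))
             for (a, b), c in joint.items())
    # marginal entropies
    ha = -sum((c / n) * log(c / n) for c in ca.values())
    hb = -sum((c / n) * log(c / n) for c in cb.values())
    if ha == 0 or hb == 0:
        return 1.0 if ha == hb else 0.0
    return mi / sqrt(ha * hb)
```

Note that NMI is invariant to label permutation: a clustering that swaps community IDs still scores 1.0 against the ground truth.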
- North America > United States > Texas > Dallas County > Dallas (0.04)
- North America > United States > California > Riverside County > Riverside (0.04)
- Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
- Asia > India (0.04)
Behavior-Specific Filtering for Enhanced Pig Behavior Classification in Precision Livestock Farming
Zhang, Zhen, Ha, Dong Sam, Morota, Gota, Shin, Sook
Precision Livestock Farming (PLF) has emerged as a critical field for monitoring and improving animal health and behavior [1]. Accurate and continuous tracking of livestock behavior is essential for identifying early signs of health issues and enabling timely intervention. Traditional methods for monitoring pig behavior, such as manual observation, are labor-intensive, limited in scalability, and prone to inaccuracies [2]. Recent advancements in PLF have introduced automated systems that leverage biosensors to track behavior in real time. These sensors, often attached to animals, collect data that is both cost-effective and reliable, making them indispensable for modern livestock management [3,4].
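Although this excerpt does not describe the paper's filters, behavior classification pipelines of this kind typically smooth raw sensor streams before classification. The sketch below shows a generic moving-average filter as a stand-in; the paper's behavior-specific filtering may differ:

```python
def moving_average(signal, window=3):
    """Smooth a sensor stream (e.g. an accelerometer axis) with a
    centered moving average, shrinking the window at the edges.
    Illustrative preprocessing only, not the paper's actual filter."""
    half = window // 2
    out = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out
```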
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- North America > United States > Virginia > Montgomery County > Blacksburg (0.04)
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- Health & Medicine (1.00)
- Information Technology (0.68)
- Food & Agriculture > Agriculture (0.50)
Compilation, Optimization, Error Mitigation, and Machine Learning in Quantum Algorithms
Wang, Shuangbao Paul, Mao, Jianzhou, Sakk, Eric
This paper discusses the compilation, optimization, and error mitigation of quantum algorithms, steps essential for executing real-world quantum algorithms. Quantum algorithms running on a hybrid platform with a QPU and CPU/GPU take advantage of existing high-performance computing power alongside quantum-enabled exponential speedups. The proposed approximate quantum Fourier transform (AQFT) for quantum algorithm optimization improves circuit execution on top of the exponential speed-up that the quantum Fourier transform already provides.
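The gate savings behind an approximate QFT can be sketched by counting controlled-phase rotations: in the standard (Coppersmith-style) AQFT, controlled rotations R_k with k above a cutoff m are dropped because their angles π/2^(k-1) are negligibly small. The counts below assume that standard construction, not necessarily the exact variant proposed in the paper:

```python
def qft_rotations(n):
    """Full QFT on n qubits: each qubit receives a controlled-R_k
    rotation from every later qubit, giving n*(n-1)/2 two-qubit gates
    (plus n Hadamards, not counted here)."""
    return n * (n - 1) // 2

def aqft_rotations(n, m):
    """Coppersmith-style AQFT: keep only controlled rotations R_k with
    k <= m, i.e. at most m-1 rotations per qubit, so the two-qubit
    gate count drops from O(n^2) to O(n*m)."""
    return sum(min(n - 1 - i, m - 1) for i in range(n))
```

For example, on 5 qubits the full QFT uses 10 controlled rotations, while an AQFT with cutoff m = 2 uses only 4, at the cost of a bounded approximation error.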
Multi-Domain ABSA Conversation Dataset Generation via LLMs for Real-World Evaluation and Model Comparison
Pandit, Tejul, Raval, Meet, Upadhyay, Dhvani
Aspect-Based Sentiment Analysis (ABSA) offers granular insights into opinions but often suffers from the scarcity of diverse, labeled datasets that reflect real-world conversational nuances. This paper presents an approach for generating synthetic ABSA data using Large Language Models (LLMs) to address this gap. We detail the generation process aimed at producing data with consistent topic and sentiment distributions across multiple domains using GPT-4o. The quality and utility of the generated data were evaluated by assessing the performance of three state-of-the-art LLMs (Gemini 1.5 Pro, Claude 3.5 Sonnet, and DeepSeek-R1) on topic and sentiment classification tasks. Our results demonstrate the effectiveness of the synthetic data, revealing distinct performance trade-offs among the models: DeepSeek-R1 showed higher precision, Gemini 1.5 Pro and Claude 3.5 Sonnet exhibited strong recall, and Gemini 1.5 Pro offered significantly faster inference. We conclude that LLM-based synthetic data generation is a viable and flexible method for creating valuable ABSA resources, facilitating research and model evaluation without reliance on limited or inaccessible real-world labeled data.
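The precision/recall trade-offs reported above rest on the standard definitions of those metrics. A minimal sketch for a single sentiment class follows, with illustrative labels rather than the paper's data:

```python
def precision_recall(y_true, y_pred, positive):
    """Per-class precision and recall for a classification task,
    e.g. the 'positive' sentiment label in an ABSA evaluation."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if p != positive and t == positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

A model like DeepSeek-R1 with "higher precision" makes fewer false-positive calls for a class, while the "strong recall" models miss fewer true instances; both behaviors fall directly out of these two ratios.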
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- North America > United States > New York (0.04)
- North America > United States > Colorado > Denver County > Denver (0.04)
A Domain Ontology for Modeling the Book of Purification in Islam
This paper aims to address a gap in major Islamic topics by developing an ontology for the Book of Purification in Islam. Many authoritative Islamic texts begin with the Book of Purification, as it is essential for performing prayer (the second pillar of Islam after Shahadah, the profession of faith) and other religious duties such as Umrah and Hajj. The ontology development strategy followed six key steps: (1) domain identification, (2) knowledge acquisition, (3) conceptualization, (4) classification, (5) integration and implementation, and (6) ontology generation. This paper includes examples of the constructed tables and classifications. The focus is on the design and analysis phases, as technical implementation is beyond the scope of this study. However, an initial implementation is provided to illustrate the steps of the proposed strategy. The developed ontology ensures reusability by formally defining and encoding the key concepts, attributes, and relationships related to the Book of Purification. This structured representation is intended to support knowledge sharing and reuse.
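A small, illustrative fragment of such an ontology can be encoded as a class hierarchy plus relations. The concepts below (Wudu, Ghusl, and Tayammum as kinds of purification) are standard in the domain, but the encoding itself is a sketch, not the paper's implementation:

```python
# Illustrative ontology fragment for the Book of Purification,
# expressed as a subclass map plus a list of (subject, relation, object)
# triples. A production ontology would use OWL/RDF rather than dicts.
ontology = {
    "classes": {
        "Purification": None,        # root concept (Taharah)
        "Wudu": "Purification",      # minor ablution
        "Ghusl": "Purification",     # major ritual bath
        "Tayammum": "Purification",  # dry ablution with clean earth
        "Prayer": None,
    },
    "relations": [
        ("Prayer", "requires", "Purification"),
        ("Tayammum", "substitutesFor", "Wudu"),
    ],
}

def subclasses(onto, parent):
    """Return the direct subclasses of a concept, sorted by name."""
    return sorted(c for c, p in onto["classes"].items() if p == parent)
```

Steps (3) conceptualization through (5) integration of the paper's strategy amount to filling out exactly this kind of structure before step (6) generates the formal ontology.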
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Europe > United Kingdom > England > Greater London > London (0.04)
- Europe > Sweden > Stockholm > Stockholm (0.04)
- Asia > Middle East > Saudi Arabia > Riyadh Province > Riyadh (0.04)
Optimizing Retrieval-Augmented Generation for Electrical Engineering: A Case Study on ABB Circuit Breakers
Alawadhi, Salahuddin, Abbas, Noorhan
Integrating Retrieval Augmented Generation (RAG) with Large Language Models (LLMs) has shown the potential to provide precise, contextually relevant responses in knowledge-intensive domains. This study investigates the application of RAG for ABB circuit breakers, focusing on accuracy, reliability, and contextual relevance in high-stakes engineering environments. By leveraging tailored datasets, advanced embedding models, and optimized chunking strategies, the research addresses challenges in data retrieval and contextual alignment unique to engineering documentation. Key contributions include the development of a domain-specific dataset for ABB circuit breakers and the evaluation of three RAG pipelines: OpenAI GPT-4o, Cohere, and Anthropic Claude. Advanced chunking methods, such as paragraph-based and title-aware segmentation, are assessed for their impact on retrieval accuracy and response generation. Results demonstrate that while certain configurations achieve high precision and relevancy, limitations persist in ensuring factual faithfulness and completeness, critical in engineering contexts. This work underscores the need for iterative improvements in RAG systems to meet the stringent demands of electrical engineering tasks, including design, troubleshooting, and operational decision-making. The findings in this paper help advance research of AI in highly technical domains such as electrical engineering.
Electrical engineering is a cornerstone of modern infrastructure, underpinning systems that power cities, enable communication, and drive technological innovation. From power generation and distribution to the design of advanced electronic systems, electrical engineering plays a vital role in ensuring the reliability, efficiency, and safety of critical infrastructure [1]. Mistakes or inaccuracies in the design, operation, or maintenance of electrical systems can have far-reaching consequences, including equipment failure, financial losses, and risks to public safety. In such high-stakes environments, precision and reliability in accessing accurate technical information are paramount [2]. Similarly, in medicine, iterative retrieval methods have been proposed to enhance the accuracy of RAG systems. Xiong et al. [3] introduced the i-MedRAG system, which dynamically generates follow-up queries to refine responses. This approach improved retrieval accuracy and generalizability, although it incurred higher computational costs.
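Title-aware segmentation, one of the chunking methods assessed above, can be sketched as follows. The heading patterns here are hypothetical stand-ins for whatever structure ABB's documentation actually uses:

```python
import re

def title_aware_chunks(text, max_chars=500):
    """Split a technical document into retrieval chunks, starting a new
    chunk at each heading so a section title stays attached to its body.
    Heading detection (markdown '#' lines or ALL-CAPS lines) is a
    hypothetical stand-in for real document structure."""
    heading = re.compile(r"^(#+ .+|[A-Z][A-Z0-9 /-]{3,})$")
    chunks, current = [], []
    for line in text.splitlines():
        if heading.match(line.strip()) and current:
            chunks.append("\n".join(current))   # flush before a new section
            current = []
        current.append(line)
        if sum(len(l) for l in current) > max_chars:
            chunks.append("\n".join(current))   # hard cap on chunk size
            current = []
    if current:
        chunks.append("\n".join(current))
    return chunks
```

Keeping the title inside the chunk is the point of the technique: a retrieved passage like "RATED CURRENT / Up to 630 A" remains interpretable, whereas the body text alone would lose its context.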
- Europe > United Kingdom > England > West Yorkshire > Leeds (0.04)
- Asia > Middle East > UAE > Dubai Emirate > Dubai (0.04)
- Africa > Middle East > Egypt > Cairo Governorate > Cairo (0.04)
Surfacing Semantic Orthogonality Across Model Safety Benchmarks: A Multi-Dimensional Analysis
Bennion, Jonathan, Ghosh, Shaona, Singh, Mantek, Dziri, Nouha
Various AI safety datasets have been developed to measure LLMs against evolving interpretations of harm. Our evaluation of five recently published open-source safety benchmarks reveals distinct semantic clusters using UMAP dimensionality reduction and k-means clustering (silhouette score: 0.470). We identify six primary harm categories with varying benchmark representation. GretelAI, for example, focuses heavily on privacy concerns, while WildGuardMix emphasizes self-harm scenarios. Significant differences in prompt length distributions suggest confounds in data collection and in interpretations of harm, while also offering possible context. Our analysis quantifies orthogonality among AI safety benchmarks, allowing for transparency in coverage gaps despite topical similarities. Our quantitative framework for analyzing semantic orthogonality across safety benchmarks enables more targeted development of datasets that comprehensively address the evolving landscape of harms in AI use, however harm comes to be defined in the future.
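The silhouette score cited above measures how well separated the discovered clusters are. Below is a minimal pure-Python version of the metric, with toy 2-D points standing in for the paper's UMAP-reduced prompt embeddings:

```python
from math import dist  # Euclidean distance, Python 3.8+

def silhouette(points, labels):
    """Mean silhouette coefficient: for each point, s = (b - a) / max(a, b),
    where a is the mean distance to its own cluster and b is the smallest
    mean distance to any other cluster. Ranges from -1 (bad) to 1 (tight,
    well-separated clusters)."""
    scores = []
    for i, p in enumerate(points):
        own = [dist(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        a = sum(own) / len(own)
        b = min(
            sum(dist(p, q) for j, q in enumerate(points) if labels[j] == lab)
            / labels.count(lab)
            for lab in set(labels) if lab != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)
```

A score of 0.470, as reported, indicates moderately separated clusters: real overlap remains between benchmarks even though distinct semantic groupings exist.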
- Information Technology > Data Science > Data Mining (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
MCP Guardian: A Security-First Layer for Safeguarding MCP-Based AI System
Kumar, Sonu, Girdhar, Anubhav, Patil, Ritesh, Tripathi, Divyansh
As agentic AI systems gain mainstream adoption, the industry is investing heavily in model capabilities, achieving rapid leaps in reasoning and quality. However, these systems remain largely confined to data silos, and each new integration requires custom logic that is difficult to scale. The Model Context Protocol (MCP) addresses this challenge by defining a universal, open standard for securely connecting AI-based applications (MCP clients) to data sources (MCP servers). However, the flexibility of MCP introduces new risks, including malicious tool servers and compromised data integrity. We present MCP Guardian, a framework that strengthens MCP-based communication with authentication, rate limiting, logging, tracing, and Web Application Firewall (WAF) scanning. Through real-world scenarios and empirical testing, we demonstrate how MCP Guardian effectively mitigates attacks and ensures robust oversight with minimal overhead. Our approach fosters secure, scalable data access for AI assistants, underscoring the importance of a defense-in-depth approach that enables safer and more transparent innovation in AI-driven environments.
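Rate limiting, one of the protections listed above, is commonly implemented as a token bucket. The sketch below is a generic version; the class and method names are illustrative, not MCP Guardian's actual API:

```python
import time

class TokenBucket:
    """Minimal token-bucket rate limiter, the kind of guard a gateway
    like MCP Guardian could place in front of an MCP server to throttle
    a misbehaving client (illustrative names, not the project's API)."""

    def __init__(self, rate, capacity):
        self.rate = rate            # tokens refilled per second
        self.capacity = capacity    # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self):
        """Consume one token if available; refuse the request otherwise."""
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

In a defense-in-depth stack, a refusal here would be logged and traced before ever reaching the MCP server, so throttling decisions remain auditable.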
- North America > United States (0.04)
- Asia > India > Uttarakhand > Roorkee (0.04)
Adaptive Token Boundaries: Integrating Human Chunking Mechanisms into Multimodal LLMs
Recent advancements in multimodal large language models (MLLMs) have demonstrated remarkable capabilities in processing diverse data types, yet significant disparities persist between human cognitive processes and computational approaches to multimodal information integration. This research presents a systematic investigation into the parallels between human cross-modal chunking mechanisms and token representation methodologies in MLLMs. Through empirical studies comparing human performance patterns with model behaviors across visual-linguistic tasks, we demonstrate that conventional static tokenization schemes fundamentally constrain current models' capacity to simulate the dynamic, context-sensitive nature of human information processing. We propose a novel framework for dynamic cross-modal tokenization that incorporates adaptive boundaries, hierarchical representations, and alignment mechanisms grounded in cognitive science principles. Quantitative evaluations demonstrate that our approach yields statistically significant improvements over state-of-the-art models on benchmark tasks (+7.8% on Visual Question Answering, +5.3% on Complex Scene Description) while exhibiting more human-aligned error patterns and attention distributions. These findings contribute to the theoretical understanding of the relationship between human cognition and artificial intelligence, while providing empirical evidence for developing more cognitively plausible AI systems.
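The idea of adaptive boundaries can be illustrated with a toy rule that cuts a sequence wherever a "surprisal" signal jumps, rather than at fixed positions. This is a conceptual sketch of the chunking intuition, not the paper's tokenizer:

```python
def adaptive_chunks(scores, threshold=0.5):
    """Group sequence positions into chunks, starting a new chunk
    whenever the change between adjacent 'surprisal' scores exceeds
    a threshold. Fixed-size tokenization would cut at preset offsets;
    this toy rule instead lets content determine the boundaries."""
    chunks, current = [], [0]
    for i in range(1, len(scores)):
        if abs(scores[i] - scores[i - 1]) > threshold:
            chunks.append(current)  # a sharp change starts a new chunk
            current = []
        current.append(i)
    chunks.append(current)
    return chunks
```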
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.95)
- Education (1.00)
- Health & Medicine > Health Care Technology (0.69)
- Health & Medicine > Therapeutic Area > Neurology (0.69)
- Health & Medicine > Diagnostic Medicine > Imaging (0.47)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)